
85 research outputs found

Brute Force Information Retrieval Experiments using MapReduce

Author: Hauff Claudia
Hiemstra Djoerd
Publication venue: European Research Consortium for Informatics and Mathematics
Publication date: 01/01/2012

MIREX (MapReduce Information Retrieval Experiments) is a software library initially developed by the Database Group of the University of Twente for running large-scale information retrieval experiments on clusters of machines. MIREX has been tested on web crawls of up to half a billion web pages, totalling about 12.5 TB of uncompressed data. MIREX shows that executing test queries by a brute-force linear scan of pages is a viable alternative to running the test queries on a search engine's inverted index. MIREX is open source and available for others.

Radboud Repository

University of Twente Research Information

University of Twente @ TREC 2009: Indexing half a billion web pages

Author: Hauff Claudia
Hiemstra Djoerd
Publication venue: National Institute of Standards and Technology (NIST)
Publication date: 01/01/2009

This report presents results for the TREC 2009 adhoc task, the diversity task, and the relevance feedback task. We present ideas for unsupervised tuning of a search system, an approach for spam removal, and the use of categories and query log information for diversifying search results.

CiteSeerX

Radboud Repository

University of Twente Research Information

MapReduce for information retrieval evaluation: "Let's quickly test this on 12 TB of data"

Author: Hauff Claudia
Hiemstra Djoerd
Publication venue: Springer
Publication date: 01/01/2010

We propose to use MapReduce to quickly test new retrieval approaches on a cluster of machines by sequentially scanning all documents. We present a small case study in which we use a cluster of 15 low-cost machines to search a web crawl of 0.5 billion pages, showing that sequential scanning is a viable approach to running large-scale information retrieval experiments with little effort. The code is available to other researchers at: http://mirex.sourceforge.net
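The sequential-scan idea in this abstract can be illustrated with a minimal, single-process sketch. It is not the MIREX code: the scoring function (a raw count of query terms), the function names, and the toy documents are all assumptions made for illustration. The map step scores every document against every query, and the reduce step keeps the top results per query.

```python
from collections import Counter
import heapq


def score(query_terms, text):
    """Toy score (an assumption): total occurrences of the query terms."""
    counts = Counter(text.lower().split())
    return sum(counts[t] for t in query_terms)


def map_phase(doc_id, text, queries):
    """Scan one document and emit (query_id, (score, doc_id)) for each match."""
    for query_id, terms in queries.items():
        s = score(terms, text)
        if s > 0:
            yield query_id, (s, doc_id)


def reduce_phase(pairs, k=2):
    """Group emitted pairs by query and keep the k highest-scoring documents."""
    by_query = {}
    for query_id, scored in pairs:
        by_query.setdefault(query_id, []).append(scored)
    return {q: heapq.nlargest(k, v) for q, v in by_query.items()}


# Hypothetical three-document "crawl" and two test queries.
docs = {
    "d1": "mapreduce on a cluster",
    "d2": "inverted index search",
    "d3": "cluster search with mapreduce mapreduce",
}
queries = {"q1": ["mapreduce"], "q2": ["search", "index"]}

# Every document is scanned once; no inverted index is built.
pairs = [p for doc_id, text in docs.items() for p in map_phase(doc_id, text, queries)]
results = reduce_phase(pairs)
print(results)
```

On a real cluster the map calls would run in parallel over splits of the crawl, and the framework would handle the grouping that `reduce_phase` does here by hand.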

CiteSeerX

Crossref

Radboud Repository

University of Twente Research Information

MIREX: MapReduce Information Retrieval Experiments

Author: Hauff Claudia
Hiemstra Djoerd
Publication date: 01/01/2010

We propose to use MapReduce to quickly test new retrieval approaches on a cluster of machines by sequentially scanning all documents. We present a small case study in which we use a cluster of 15 low-cost machines to search a web crawl of 0.5 billion pages, showing that sequential scanning is a viable approach to running large-scale information retrieval experiments with little effort. The code is available to other researchers at: http://mirex.sourceforge.ne

arXiv.org e-Print Archive

CiteSeerX

University of Twente Research Information

The Effectiveness of Concept Based Search for Video Retrieval

Author: Aly Robin
Hauff Claudia
Hiemstra Djoerd
Publication venue: Gesellschaft fuer Informatik
Publication date: 01/01/2007

In this paper we investigate how a small number of high-level concepts derived for video shots, such as Sport, Face, Indoor, etc., can be used effectively for ad hoc search in video material. We answer the following questions: 1) Can we automatically construct concept queries from ordinary text queries? 2) What is the best way to combine evidence from single concept detectors into final search results? We evaluated algorithms for automatic concept query formulation using WordNet-based concept extraction, and we evaluated algorithms for fast, on-line combination of concepts. Experimental results on data from the TREC Video 2005 workshop and 25 test users show the following: 1) automatic query formulation through WordNet-based concept extraction can achieve results comparable to user-created query concepts, and 2) combination methods that take neighboring shots into account outperform simpler combination methods.
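The abstract does not specify the combination methods, so the following is only a hypothetical sketch of what "taking neighboring shots into account" could look like. It averages detector confidences for the query concepts per shot, then smooths the result over adjacent shots. The function names, the averaging rule, the window radius, and the confidence values are all assumptions.

```python
def combine(shot_scores, concepts):
    """Simple combination: mean detector confidence over the query concepts."""
    return sum(shot_scores[c] for c in concepts) / len(concepts)


def smooth(scores, radius=1):
    """Neighbor-aware combination: mean over a window of adjacent shots."""
    out = []
    for i in range(len(scores)):
        window = scores[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out


# Hypothetical detector confidences for three consecutive shots.
shots = [
    {"Sport": 0.0, "Indoor": 0.0},
    {"Sport": 1.0, "Indoor": 0.5},
    {"Sport": 0.5, "Indoor": 0.5},
]

raw = [combine(s, ["Sport", "Indoor"]) for s in shots]
smoothed = smooth(raw)
print(raw, smoothed)
```

The smoothing step reflects the intuition that shots next to a relevant shot are more likely to be relevant themselves; whether the paper uses this particular scheme is not stated in the abstract.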

CiteSeerX

Radboud Repository

University of Twente Research Information

University of Twente at GeoCLEF 2006: geofiltered document retrieval

Author: Hauff Claudia
Rode Henning
Trieschnigg Dolf
Publication date: 01/01/2006

In this report we describe the approach of the University of Twente to the 2006 GeoCLEF task. It is based on retrieval by content, followed by filtering on geographical relevance using a gazetteer. The results do not show an improvement in retrieval performance when geographical information is taken into account.
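The two-stage approach described here (content retrieval, then geographic filtering) can be sketched as follows. This is an illustrative toy, not the system from the report: the term-count ranking, the tiny gazetteer, and all names are assumptions.

```python
def content_rank(query, docs):
    """Stage 1: rank documents by occurrences of the query terms (toy score)."""
    terms = query.lower().split()
    scored = []
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        s = sum(tokens.count(t) for t in terms)
        if s > 0:
            scored.append((s, doc_id))
    # Highest score first; ties broken by document id for determinism.
    return [d for s, d in sorted(scored, key=lambda p: (-p[0], p[1]))]


def geo_filter(ranked, docs, region, gazetteer):
    """Stage 2: keep only documents that mention a place in the target region."""
    places = gazetteer.get(region, set())
    return [d for d in ranked if places & set(docs[d].lower().split())]


# Hypothetical gazetteer mapping a region to known place names.
GAZETTEER = {"netherlands": {"enschede", "amsterdam", "utrecht"}}

docs = {
    "d1": "flood damage in enschede",
    "d2": "flood warning flood alert in paris",
    "d3": "amsterdam museum opening",
}

ranked = content_rank("flood", docs)
filtered = geo_filter(ranked, docs, "netherlands", GAZETTEER)
print(ranked, filtered)
```

Note that the filter can only remove documents from the content ranking, never add or reorder them, which is one reason such a step may fail to improve retrieval performance.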

Crossref

University of Twente Research Information

Scope of negation detection in sentiment analysis

Author: Dadvar Maral
Hauff Claudia
Jong Franciska de
Publication venue: University of Amsterdam
Publication date: 01/01/2011

An important part of information-gathering behaviour has always been to find out what other people think and whether they have favourable (positive) or unfavourable (negative) opinions about the subject. This survey studies the role of negation in an opinion-oriented information-seeking system. We investigate the problem of determining the polarity of sentiments in movie reviews when negation words, such as "not" and "hardly", occur in the sentences. We examine how different negation scopes (window sizes) affect the classification accuracy. We used term frequencies to evaluate the discrimination capacity of our system with different window sizes. The results show no significant difference in classification accuracy when different window sizes are applied. However, negation detection helped to identify more opinion- or sentiment-carrying expressions. We conclude that traditional negation detection methods are inadequate for the task of sentiment analysis in this domain and that progress is to be made by exploiting information about how opinions are expressed implicitly.
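A fixed-window negation scope of the kind examined here can be sketched as below. The abstract does not describe the implementation, so the negation word list, the `NOT_` prefix convention, and the function name are assumptions; the sketch only shows how the window size changes which terms count as negated.

```python
# Hypothetical negation cue list; the study's actual list is not given.
NEGATIONS = {"not", "hardly", "never", "no"}


def mark_negation(tokens, window):
    """Prefix the `window` tokens following a negation word with NOT_."""
    marked = []
    remaining = 0  # tokens still inside the current negation scope
    for tok in tokens:
        if tok in NEGATIONS:
            marked.append(tok)
            remaining = window  # open (or reset) the scope
        elif remaining > 0:
            marked.append("NOT_" + tok)
            remaining -= 1
        else:
            marked.append(tok)
    return marked


tokens = "the plot is not very good but fun".split()
scope_1 = mark_negation(tokens, 1)
scope_2 = mark_negation(tokens, 2)
print(scope_1)
print(scope_2)
```

With a window of 1 only "very" is negated and "good" keeps its positive reading; with a window of 2 "good" is negated as well. Term frequencies computed over the marked tokens would then differ by window size, which is the effect the study measures.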

CiteSeerX

University of Twente Research Information